In [1]:
import numpy as np
import pandas as pd
In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
In [3]:
data = pd.read_csv('/Users/danielluo/Downloads/crime-in-vancouver/crime.csv')
In [4]:
data.head()
Out[4]:
TYPE YEAR MONTH DAY HOUR MINUTE HUNDRED_BLOCK NEIGHBOURHOOD X Y Latitude Longitude
0 Other Theft 2003 5 12 16.0 15.0 9XX TERMINAL AVE Strathcona 493906.5 5457452.47 49.269802 -123.083763
1 Other Theft 2003 5 7 15.0 20.0 9XX TERMINAL AVE Strathcona 493906.5 5457452.47 49.269802 -123.083763
2 Other Theft 2003 4 23 16.0 40.0 9XX TERMINAL AVE Strathcona 493906.5 5457452.47 49.269802 -123.083763
3 Other Theft 2003 4 20 11.0 15.0 9XX TERMINAL AVE Strathcona 493906.5 5457452.47 49.269802 -123.083763
4 Other Theft 2003 4 12 17.0 45.0 9XX TERMINAL AVE Strathcona 493906.5 5457452.47 49.269802 -123.083763
In [5]:
pd.unique(data.TYPE)
Out[5]:
array(['Other Theft', 'Break and Enter Residential/Other', 'Mischief',
       'Break and Enter Commercial', 'Offence Against a Person',
       'Theft from Vehicle',
       'Vehicle Collision or Pedestrian Struck (with Injury)',
       'Vehicle Collision or Pedestrian Struck (with Fatality)',
       'Theft of Vehicle', 'Homicide', 'Theft of Bicycle'], dtype=object)
In [6]:
data[data.TYPE == 'Homicide']
Out[6]:
TYPE YEAR MONTH DAY HOUR MINUTE HUNDRED_BLOCK NEIGHBOURHOOD X Y Latitude Longitude
15380 Homicide 2003 10 16 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0
15402 Homicide 2003 3 14 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0
15520 Homicide 2003 11 4 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0
15593 Homicide 2003 7 26 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0
15658 Homicide 2003 8 16 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ...
525609 Homicide 2017 4 15 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0
525619 Homicide 2017 6 27 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0
525623 Homicide 2017 5 23 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0
525630 Homicide 2017 1 27 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0
525674 Homicide 2017 3 14 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.0 0.0 0.0 0.0

220 rows × 12 columns

Let's create a pivot table that counts each crime type per neighbourhood. Note that for Homicide and Offence Against a Person, the location is not recorded, so these types aren't included in the following pivot tables.
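The exclusion happens because `pivot_table` groups on `NEIGHBOURHOOD`, and pandas drops rows whose group key is NaN by default. Here is a small sketch with made-up rows (not from the dataset) showing that, plus `pd.crosstab`, which gives the same counts without the helper `counter` column and fills missing combinations with 0 instead of NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the crime data; the Homicide row has no location
toy = pd.DataFrame({
    "TYPE": ["Mischief", "Mischief", "Other Theft", "Homicide"],
    "NEIGHBOURHOOD": ["West End", "Fairview", "West End", np.nan],
})
toy["counter"] = 1

# Same call pattern as In [8]: the NaN-neighbourhood row is silently dropped
pivot = pd.pivot_table(toy, values="counter", index="NEIGHBOURHOOD",
                       columns="TYPE", aggfunc="count")

# crosstab counts (NEIGHBOURHOOD, TYPE) pairs directly and fills gaps with 0
xtab = pd.crosstab(toy["NEIGHBOURHOOD"], toy["TYPE"])

print(pivot)
print(xtab)
```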

In [7]:
data['counter'] = 1

Crime by Neighbourhood

In [8]:
neighborhood_crime = pd.pivot_table(data, values = 'counter', index = 'NEIGHBOURHOOD', columns = 'TYPE', aggfunc='count')
neighborhood_crime['All Crime Count'] = neighborhood_crime.sum(axis=1)
neighborhood_crime.sort_values('All Crime Count')[::-1]
Out[8]:
TYPE Break and Enter Commercial Break and Enter Residential/Other Mischief Other Theft Theft from Vehicle Theft of Bicycle Theft of Vehicle Vehicle Collision or Pedestrian Struck (with Fatality) Vehicle Collision or Pedestrian Struck (with Injury) All Crime Count
NEIGHBOURHOOD
Central Business District 9371 3505 16672 19244 48003 6907 4016 41 3188 110947
West End 2775 3480 5325 6033 16904 2985 2660 6 1184 41352
Fairview 3303 3834 3196 3269 11934 3394 2037 14 1180 32161
Mount Pleasant 2769 3278 4070 3698 9679 2746 2654 18 1624 30536
Grandview-Woodland 2082 4515 4970 2508 7342 1403 3111 9 1240 27180
Renfrew-Collingwood 1197 4296 3886 4119 8420 419 3011 12 1401 26761
Kitsilano 2092 4390 3692 1730 8912 2464 2366 13 1040 26699
Kensington-Cedar Cottage 1277 4136 3760 2961 7474 859 2919 11 1544 24941
Strathcona 2168 2019 4556 994 7343 1015 1650 20 1154 20919
Hastings-Sunrise 929 3199 2904 1379 5654 321 2452 18 1270 18126
Sunset 1105 2578 3243 1401 5226 255 2275 17 1296 17396
Marpole 1098 2527 1905 612 4151 232 1617 10 931 13083
Riley Park 848 2706 1795 410 4269 621 1197 4 671 12521
Victoria-Fraserview 386 2499 1761 483 3390 132 1372 10 786 10819
Killarney 302 2130 1761 245 3990 163 1302 12 570 10475
Oakridge 332 2089 889 1176 2290 172 669 6 414 8037
Dunbar-Southlands 294 1847 1324 241 2899 240 629 3 269 7746
Kerrisdale 326 1826 1049 265 2805 179 547 7 443 7447
Arbutus Ridge 325 1672 934 337 1852 160 498 3 285 6066
West Point Grey 331 1299 879 260 1971 372 450 4 305 5871
Shaughnessy 129 1774 633 25 1769 139 371 7 579 5426
South Cambie 314 1109 606 759 1529 221 435 2 237 5212
Stanley Park 72 65 246 13 2868 214 74 6 217 3775
Musqueam 17 86 104 1 217 7 40 1 59 532

Crime by Neighbourhood by proportion

In [9]:
neighborhood_crime_percent = round(neighborhood_crime.iloc[:,:9].div(neighborhood_crime['All Crime Count'], axis = 0) * 100, 2)
neighborhood_crime_percent['All Crime Count'] = neighborhood_crime['All Crime Count']
neighborhood_crime_percent.sort_values('All Crime Count')[::-1]
Out[9]:
TYPE Break and Enter Commercial Break and Enter Residential/Other Mischief Other Theft Theft from Vehicle Theft of Bicycle Theft of Vehicle Vehicle Collision or Pedestrian Struck (with Fatality) Vehicle Collision or Pedestrian Struck (with Injury) All Crime Count
NEIGHBOURHOOD
Central Business District 8.45 3.16 15.03 17.35 43.27 6.23 3.62 0.04 2.87 110947
West End 6.71 8.42 12.88 14.59 40.88 7.22 6.43 0.01 2.86 41352
Fairview 10.27 11.92 9.94 10.16 37.11 10.55 6.33 0.04 3.67 32161
Mount Pleasant 9.07 10.73 13.33 12.11 31.70 8.99 8.69 0.06 5.32 30536
Grandview-Woodland 7.66 16.61 18.29 9.23 27.01 5.16 11.45 0.03 4.56 27180
Renfrew-Collingwood 4.47 16.05 14.52 15.39 31.46 1.57 11.25 0.04 5.24 26761
Kitsilano 7.84 16.44 13.83 6.48 33.38 9.23 8.86 0.05 3.90 26699
Kensington-Cedar Cottage 5.12 16.58 15.08 11.87 29.97 3.44 11.70 0.04 6.19 24941
Strathcona 10.36 9.65 21.78 4.75 35.10 4.85 7.89 0.10 5.52 20919
Hastings-Sunrise 5.13 17.65 16.02 7.61 31.19 1.77 13.53 0.10 7.01 18126
Sunset 6.35 14.82 18.64 8.05 30.04 1.47 13.08 0.10 7.45 17396
Marpole 8.39 19.32 14.56 4.68 31.73 1.77 12.36 0.08 7.12 13083
Riley Park 6.77 21.61 14.34 3.27 34.09 4.96 9.56 0.03 5.36 12521
Victoria-Fraserview 3.57 23.10 16.28 4.46 31.33 1.22 12.68 0.09 7.26 10819
Killarney 2.88 20.33 16.81 2.34 38.09 1.56 12.43 0.11 5.44 10475
Oakridge 4.13 25.99 11.06 14.63 28.49 2.14 8.32 0.07 5.15 8037
Dunbar-Southlands 3.80 23.84 17.09 3.11 37.43 3.10 8.12 0.04 3.47 7746
Kerrisdale 4.38 24.52 14.09 3.56 37.67 2.40 7.35 0.09 5.95 7447
Arbutus Ridge 5.36 27.56 15.40 5.56 30.53 2.64 8.21 0.05 4.70 6066
West Point Grey 5.64 22.13 14.97 4.43 33.57 6.34 7.66 0.07 5.20 5871
Shaughnessy 2.38 32.69 11.67 0.46 32.60 2.56 6.84 0.13 10.67 5426
South Cambie 6.02 21.28 11.63 14.56 29.34 4.24 8.35 0.04 4.55 5212
Stanley Park 1.91 1.72 6.52 0.34 75.97 5.67 1.96 0.16 5.75 3775
Musqueam 3.20 16.17 19.55 0.19 40.79 1.32 7.52 0.19 11.09 532
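Row-wise percentages can also come straight from `pd.crosstab` with `normalize="index"`, which avoids hardcoding `iloc[:, :9]` if the set of crime types ever changes. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows: neighbourhood A is mostly Mischief, B mostly Other Theft
toy = pd.DataFrame({
    "NEIGHBOURHOOD": ["A"] * 4 + ["B"] * 4,
    "TYPE": ["Mischief"] * 3 + ["Other Theft"] + ["Mischief"] + ["Other Theft"] * 3,
})

# normalize="index" divides each row by its own total, so every row sums to 1
pct = pd.crosstab(toy["NEIGHBOURHOOD"], toy["TYPE"], normalize="index").mul(100).round(2)
print(pct)
```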

Let me guess which areas are sketchy...

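Shaughnessy, Arbutus Ridge, Oakridge, and Kerrisdale have a high share of residential break-ins. This suggests either that they're mostly residential (so residential break-ins make up most of the crime there) or that they're targeted by burglars. Since their total crime counts are among the lowest in the city, the first explanation seems more likely.

Fairview, Strathcona, and Mount Pleasant have the highest shares of commercial break-ins (roughly 9 to 10%), so their commercial areas look relatively sketchy.

Stanley Park and Musqueam have the lowest total crime counts, so they seem like nice areas (although Stanley Park is mostly parkland, and about 76% of its reported crime is theft from vehicles).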

Let's look into whether there's a certain time of day when more crimes occur.

In [10]:
neighborhood_crime_time = pd.pivot_table(data, values = 'HOUR', index = 'NEIGHBOURHOOD', columns = 'TYPE', aggfunc='mean')
neighborhood_crime_time['All Crime Count'] = neighborhood_crime['All Crime Count']
neighborhood_crime_time.sort_values('All Crime Count')[::-1]
Out[10]:
TYPE Break and Enter Commercial Break and Enter Residential/Other Mischief Other Theft Theft from Vehicle Theft of Bicycle Theft of Vehicle Vehicle Collision or Pedestrian Struck (with Fatality) Vehicle Collision or Pedestrian Struck (with Injury) All Crime Count
NEIGHBOURHOOD
Central Business District 10.515847 11.841369 11.609285 14.684837 14.170281 13.890111 14.289841 11.951220 12.857591 110947
West End 10.290811 11.776149 12.966761 15.314935 14.864292 13.937688 15.057143 12.666667 14.000845 41352
Fairview 11.328792 12.679186 13.001564 15.286020 14.659460 13.683559 14.859107 13.785714 13.182203 32161
Mount Pleasant 11.047671 12.360281 12.807617 14.904543 14.494783 14.037509 14.565185 10.500000 13.474138 30536
Grandview-Woodland 11.021134 12.534884 12.871227 14.699362 14.378507 13.660727 14.864995 14.222222 13.530645 27180
Renfrew-Collingwood 11.013367 12.769088 13.259907 15.528041 14.450831 12.844869 14.568914 9.500000 13.607423 26761
Kitsilano 10.365201 12.638497 13.192308 15.247977 14.911355 13.829140 15.367709 12.000000 13.843269 26699
Kensington-Cedar Cottage 10.385278 12.754836 12.978191 14.612969 14.725582 13.853318 14.751627 13.727273 13.775907 24941
Strathcona 11.256919 12.554730 12.341967 14.443662 13.670979 13.573399 13.747273 12.850000 13.383016 20919
Hastings-Sunrise 10.815931 12.956236 13.328168 15.202321 14.577821 13.333333 15.064845 9.777778 13.521260 18126
Sunset 10.478733 12.866951 13.484120 15.287652 14.576732 13.549020 14.795604 15.529412 13.556327 17396
Marpole 11.801457 12.912940 13.685564 14.674837 14.344013 12.978448 14.951763 12.200000 13.636950 13083
Riley Park 10.147406 12.763119 13.138719 14.612195 14.758960 14.127214 15.161236 12.750000 13.618480 12521
Victoria-Fraserview 10.401554 13.505002 13.399205 14.327122 14.448378 14.113636 14.954082 11.800000 13.405852 10819
Killarney 10.192053 12.905164 13.766610 14.702041 14.818546 13.208589 15.479263 11.000000 13.547368 10475
Oakridge 12.942771 13.524653 13.301462 15.287415 14.643231 12.965116 13.973094 15.666667 13.625604 8037
Dunbar-Southlands 10.568027 13.126151 13.589879 15.161826 14.923767 12.970833 16.087440 12.333333 13.676580 7746
Kerrisdale 10.128834 13.032859 13.346044 14.977358 14.415686 13.061453 14.963437 16.142857 12.925508 7447
Arbutus Ridge 10.636923 13.419258 13.688437 15.201780 14.901728 13.293750 15.188755 9.666667 13.463158 6066
West Point Grey 10.734139 12.936875 13.492605 14.842308 14.832065 13.319892 14.484444 10.500000 14.062295 5871
Shaughnessy 11.837209 13.286922 12.865719 16.160000 14.714528 13.302158 14.857143 12.571429 13.081174 5426
South Cambie 11.277070 12.798016 12.648515 16.274045 14.582734 12.583710 15.032184 15.500000 13.776371 5212
Stanley Park 10.069444 13.261538 12.666667 13.538462 13.728382 14.163551 13.662162 15.666667 12.857143 3775
Musqueam 9.764706 11.104651 12.663462 17.000000 14.552995 18.285714 13.525000 0.000000 13.000000 532
In [11]:
neighborhood_crime_time = pd.pivot_table(data, values = 'HOUR', index = 'NEIGHBOURHOOD', columns = 'TYPE', aggfunc='median')
neighborhood_crime_time['All Crime Count'] = neighborhood_crime['All Crime Count']
neighborhood_crime_time.sort_values('All Crime Count')[::-1]
Out[11]:
TYPE Break and Enter Commercial Break and Enter Residential/Other Mischief Other Theft Theft from Vehicle Theft of Bicycle Theft of Vehicle Vehicle Collision or Pedestrian Struck (with Fatality) Vehicle Collision or Pedestrian Struck (with Injury) All Crime Count
NEIGHBOURHOOD
Central Business District 9.0 12.0 12.0 15.0 16.0 15.0 16.0 12.0 14.0 110947
West End 8.0 12.0 15.0 16.0 17.0 16.0 17.0 14.0 15.0 41352
Fairview 12.0 13.0 14.0 16.0 16.0 15.0 17.0 15.0 14.0 32161
Mount Pleasant 11.0 12.0 14.0 15.0 17.0 15.0 17.0 12.5 14.0 30536
Grandview-Woodland 10.0 13.0 15.0 15.0 17.0 15.0 17.0 14.0 15.0 27180
Renfrew-Collingwood 10.0 13.0 15.0 16.0 17.0 13.0 17.0 10.0 14.0 26761
Kitsilano 8.0 13.0 16.0 16.0 17.0 16.0 18.0 12.0 15.0 26699
Kensington-Cedar Cottage 8.0 13.0 15.0 15.0 17.0 15.0 17.0 13.0 15.0 24941
Strathcona 11.0 13.0 13.0 15.0 15.0 15.0 15.0 12.5 14.0 20919
Hastings-Sunrise 9.0 13.0 15.0 15.0 17.0 14.0 18.0 12.0 14.0 18126
Sunset 8.0 13.0 16.0 15.0 17.0 14.0 17.0 17.0 14.0 17396
Marpole 13.0 13.0 16.0 15.0 17.0 15.0 17.0 14.0 14.0 13083
Riley Park 7.0 13.0 15.0 15.0 17.0 16.0 18.0 14.5 14.0 12521
Victoria-Fraserview 8.0 14.0 16.0 15.0 17.0 16.0 18.0 15.0 14.0 10819
Killarney 6.0 13.0 16.0 15.0 17.0 14.0 18.0 12.5 15.0 10475
Oakridge 16.0 14.0 15.0 15.0 17.0 13.0 15.0 17.0 14.0 8037
Dunbar-Southlands 7.0 14.0 17.0 15.0 18.0 14.0 19.0 10.0 15.0 7746
Kerrisdale 7.0 14.0 16.0 16.0 17.0 14.0 17.0 16.0 13.0 7447
Arbutus Ridge 9.0 14.0 16.0 15.0 17.0 14.0 18.0 8.0 14.0 6066
West Point Grey 11.0 13.0 16.0 15.0 17.0 14.0 17.0 11.0 15.0 5871
Shaughnessy 15.0 14.0 15.0 16.0 18.0 15.0 17.0 13.0 14.0 5426
South Cambie 13.0 13.0 14.0 16.0 17.0 13.0 17.0 15.5 15.0 5212
Stanley Park 8.5 15.0 13.0 14.0 14.0 14.0 14.0 17.0 13.0 3775
Musqueam 10.0 11.0 14.0 17.0 18.0 21.0 14.5 0.0 15.0 532
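One caveat on these tables: HOUR is a clock value, so a plain mean treats 23:00 and 01:00 as being 22 hours apart and averages them to noon. A circular mean is one way around that. This is a sketch of a technique the analysis above did not use; `circular_mean_hour` is a hypothetical helper name.

```python
import numpy as np

def circular_mean_hour(hours):
    """Mean of clock hours on a 24-hour circle, so 23:00 and 01:00 average to about midnight."""
    angles = np.asarray(hours, dtype=float) * 2 * np.pi / 24.0
    # Average the unit vectors, then convert the resulting angle back to hours
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    return (mean_angle * 24.0 / (2 * np.pi)) % 24.0

late_night = [23, 1]
print(np.mean(late_night))             # the arithmetic mean lands on noon
print(circular_mean_hour(late_night))  # lands next to midnight instead
```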
In [12]:
df = data[['HOUR', 'NEIGHBOURHOOD']]
In [13]:
f, axes = plt.subplots(2, 2, figsize=(7, 7), sharex=True)
# sns.distplot is deprecated; histplot(kde=True, stat="density") draws the equivalent plot
panels = [("Central Business District", "skyblue"), ("West End", "olive"),
          ("Fairview", "gold"), ("Mount Pleasant", "teal")]
for ax, (name, colour) in zip(axes.flat, panels):
    hours = df.loc[df.NEIGHBOURHOOD == name, "HOUR"].dropna()  # HOUR is NaN for privacy-offset rows
    sns.histplot(x=hours, discrete=True, kde=True, stat="density", color=colour, ax=ax)
    ax.set_title(name)

The four neighbourhoods follow a similar pattern: crime peaks at around hour 17 to 18 (5 to 6 PM), and there's another peak at midnight.
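The peaks can also be read off numerically rather than by eye: count crimes per hour with `pd.crosstab` and take `idxmax` along each row. A sketch on toy data standing in for `df`:

```python
import pandas as pd

# Hypothetical (NEIGHBOURHOOD, HOUR) rows standing in for df from In [12]
toy = pd.DataFrame({
    "NEIGHBOURHOOD": ["West End"] * 4 + ["Fairview"] * 4,
    "HOUR": [17, 17, 0, 12, 18, 18, 18, 0],
})

# Count crimes per hour in each neighbourhood, then take the busiest hour per row
by_hour = pd.crosstab(toy["NEIGHBOURHOOD"], toy["HOUR"])
peak_hour = by_hour.idxmax(axis=1)
print(peak_hour)
```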

Let's try to put crime on a map. Some of the latitude/longitude data isn't right, or at least is really far from Vancouver (the privacy-protected records, for example, have coordinates of 0.0), so I cleaned it out.

In [14]:
cleaned = data[data.Longitude != 0]
print(cleaned.shape)
cleaned = cleaned[cleaned.Longitude > -123.797726]
print(cleaned.shape)
cleaned = cleaned[cleaned.Latitude > 49.118175]
print(cleaned.shape)

cleaned.dtypes
(476290, 13)
(476288, 13)
(476287, 13)
Out[14]:
TYPE              object
YEAR               int64
MONTH              int64
DAY                int64
HOUR             float64
MINUTE           float64
HUNDRED_BLOCK     object
NEIGHBOURHOOD     object
X                float64
Y                float64
Latitude         float64
Longitude        float64
counter            int64
dtype: object
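The three filters in In [14] can also be combined into a single boolean mask, which makes the cleaning rule easy to read in one place. A sketch on made-up coordinates, using the same thresholds:

```python
import pandas as pd

# Hypothetical coordinates: one valid point, one privacy-offset (0, 0) row, two far away
toy = pd.DataFrame({
    "Latitude": [49.27, 0.0, 49.28, 48.90],
    "Longitude": [-123.08, 0.0, -124.50, -123.10],
})

# Same three thresholds as In [14], combined into one mask
mask = (
    toy["Longitude"].ne(0)
    & toy["Longitude"].gt(-123.797726)
    & toy["Latitude"].gt(49.118175)
)
cleaned_toy = toy[mask]
print(cleaned_toy)
```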
In [15]:
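cleaned = cleaned.copy()  # work on a copy of the filtered frame to avoid SettingWithCopyWarning
cleaned['TYPE'] = cleaned['TYPE'].astype('category').cat.codes
cleaned['NEIGHBOURHOOD'] = cleaned['NEIGHBOURHOOD'].astype('category').cat.codes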
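Because `cat.codes` replaces the names with integers, the colorbars on the maps show numbers rather than labels. Keeping the categories before overwriting the column lets you translate the codes back. A sketch on made-up values (`code_to_label` is a hypothetical name):

```python
import pandas as pd

# Hypothetical crime types; categories are sorted alphabetically by default
types = pd.Series(["Mischief", "Other Theft", "Mischief", "Theft of Bicycle"]).astype("category")
codes = types.cat.codes

# Map each integer code back to its label, e.g. for annotating a colorbar
code_to_label = dict(enumerate(types.cat.categories))
print(codes.tolist(), code_to_label)
```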
In [16]:
print(min(cleaned.Latitude), ", ", min(cleaned.Longitude))
print(max(cleaned.Latitude), ", ", max(cleaned.Longitude))
49.20089685 ,  -123.223955
49.31334872 ,  -122.84459740000001
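These bounds are the values hardcoded into `extent` in the map cells. `plt.imshow` expects `extent` as (left, right, bottom, top), i.e. longitude min/max first, then latitude min/max; computing it from the data keeps the overlay in sync if the cleaning changes. A small sketch with a hypothetical helper `map_extent`:

```python
import pandas as pd

def map_extent(df):
    """Return [left, right, bottom, top], the order plt.imshow's extent expects."""
    return [df["Longitude"].min(), df["Longitude"].max(),
            df["Latitude"].min(), df["Latitude"].max()]

# Hypothetical points near the bounds printed above
toy = pd.DataFrame({"Latitude": [49.2009, 49.3133, 49.25],
                    "Longitude": [-123.2240, -122.8446, -123.1]})
print(map_extent(toy))
```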
In [17]:
from matplotlib.pyplot import figure
figure(num=None, figsize=(6.13,2.75), dpi=100, facecolor='w', edgecolor='k')
img = plt.imread('/Users/danielluo/data/vancouverpic2.png')
plt.imshow(img, zorder=0, aspect = 'auto',extent=[-123.223955, -122.8445974, 49.20089685,49.31334872])

sample = cleaned.sample(1000)

plt.scatter(sample['Longitude'],sample['Latitude'], c = sample['TYPE'], s = 5)
plt.colorbar()
plt.title("Crime by Location (coloured by crime type)")
plt.show()
In [18]:
figure(num=None, figsize=(6.13,2.75), dpi=100, facecolor='w', edgecolor='k')
img = plt.imread('/Users/danielluo/data/vancouverpic2.png')
plt.imshow(img, zorder=0, aspect = 'auto',extent=[-123.223955, -122.8445974, 49.20089685,49.31334872])

sample = cleaned.sample(1000)

plt.scatter(sample['Longitude'],sample['Latitude'], c = sample['NEIGHBOURHOOD'], s = 5)
plt.colorbar()
plt.title("Crime by Location (coloured by neighbourhood)")
plt.show()
In [19]:
neighbourhoods_alpha = pd.Series(pd.unique(data.NEIGHBOURHOOD)).sort_values().reset_index().iloc[:,1]
neighbourhoods_alpha
Out[19]:
0                 Arbutus Ridge
1     Central Business District
2             Dunbar-Southlands
3                      Fairview
4            Grandview-Woodland
5              Hastings-Sunrise
6      Kensington-Cedar Cottage
7                    Kerrisdale
8                     Killarney
9                     Kitsilano
10                      Marpole
11               Mount Pleasant
12                     Musqueam
13                     Oakridge
14          Renfrew-Collingwood
15                   Riley Park
16                  Shaughnessy
17                 South Cambie
18                 Stanley Park
19                   Strathcona
20                       Sunset
21          Victoria-Fraserview
22                     West End
23              West Point Grey
24                          NaN
Name: 0, dtype: object

I needed the area of each neighbourhood. Unfortunately I ended up looking them up one by one, which was a bit tedious but not that bad. The areas (in hectares) are courtesy of the City of Vancouver website.
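Entering the areas as a plain list only works if it lines up exactly with the alphabetical order of `neighbourhoods_alpha`. A dictionary keyed by neighbourhood name is harder to get wrong, because pandas aligns on index labels when dividing. A sketch using three of the area values and crime totals from this notebook:

```python
import pandas as pd

# A few of the hectare values from In [20], keyed by name instead of by position
areas = pd.Series({"Central Business District": 370, "West End": 198, "Fairview": 327}, name="AREA")

# Crime totals from Out[8], deliberately listed in a different order
totals = pd.Series({"Fairview": 32161, "Central Business District": 110947, "West End": 41352})

# Division aligns on the index labels, so the order of the two Series doesn't matter
density = (totals / areas).sort_values(ascending=False)
print(density.round(2))
```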

In [20]:
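# Areas in hectares, in the same alphabetical order as neighbourhoods_alpha;
# the final 0 is a placeholder for the NaN (missing) neighbourhood, which is dropped later
neighbourhood_areas = pd.Series([370, 370, 856, 327, 445, 793, 724, 631, 664, 546,
                                 559, 366, 125, 401, 805, 491, 446, 217, 405, 388,
                                 626, 531, 198, 445, 0])
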
neighbourhood_areas
Out[20]:
0     370
1     370
2     856
3     327
4     445
5     793
6     724
7     631
8     664
9     546
10    559
11    366
12    125
13    401
14    805
15    491
16    446
17    217
18    405
19    388
20    626
21    531
22    198
23    445
24      0
dtype: int64
In [21]:
area_neighbourhood = pd.concat([neighbourhoods_alpha, neighbourhood_areas], axis=1, keys = ['NEIGHBOURHOOD', 'AREA']).set_index('NEIGHBOURHOOD')
area_neighbourhood
Out[21]:
AREA
NEIGHBOURHOOD
Arbutus Ridge 370
Central Business District 370
Dunbar-Southlands 856
Fairview 327
Grandview-Woodland 445
Hastings-Sunrise 793
Kensington-Cedar Cottage 724
Kerrisdale 631
Killarney 664
Kitsilano 546
Marpole 559
Mount Pleasant 366
Musqueam 125
Oakridge 401
Renfrew-Collingwood 805
Riley Park 491
Shaughnessy 446
South Cambie 217
Stanley Park 405
Strathcona 388
Sunset 626
Victoria-Fraserview 531
West End 198
West Point Grey 445
NaN 0

Crime Density in Number of Crimes per Hectare

In [22]:
neighbourhood_crime_area = neighborhood_crime.join(area_neighbourhood)
neighbourhood_crime_area['density'] = neighbourhood_crime_area['All Crime Count'] / neighbourhood_crime_area.AREA
neighbourhood_crime_area.sort_values('density')[::-1]
Out[22]:
Break and Enter Commercial Break and Enter Residential/Other Mischief Other Theft Theft from Vehicle Theft of Bicycle Theft of Vehicle Vehicle Collision or Pedestrian Struck (with Fatality) Vehicle Collision or Pedestrian Struck (with Injury) All Crime Count AREA density
NEIGHBOURHOOD
Central Business District 9371 3505 16672 19244 48003 6907 4016 41 3188 110947 370 299.856757
West End 2775 3480 5325 6033 16904 2985 2660 6 1184 41352 198 208.848485
Fairview 3303 3834 3196 3269 11934 3394 2037 14 1180 32161 327 98.351682
Mount Pleasant 2769 3278 4070 3698 9679 2746 2654 18 1624 30536 366 83.431694
Grandview-Woodland 2082 4515 4970 2508 7342 1403 3111 9 1240 27180 445 61.078652
Strathcona 2168 2019 4556 994 7343 1015 1650 20 1154 20919 388 53.914948
Kitsilano 2092 4390 3692 1730 8912 2464 2366 13 1040 26699 546 48.899267
Kensington-Cedar Cottage 1277 4136 3760 2961 7474 859 2919 11 1544 24941 724 34.448895
Renfrew-Collingwood 1197 4296 3886 4119 8420 419 3011 12 1401 26761 805 33.243478
Sunset 1105 2578 3243 1401 5226 255 2275 17 1296 17396 626 27.789137
Riley Park 848 2706 1795 410 4269 621 1197 4 671 12521 491 25.501018
South Cambie 314 1109 606 759 1529 221 435 2 237 5212 217 24.018433
Marpole 1098 2527 1905 612 4151 232 1617 10 931 13083 559 23.404293
Hastings-Sunrise 929 3199 2904 1379 5654 321 2452 18 1270 18126 793 22.857503
Victoria-Fraserview 386 2499 1761 483 3390 132 1372 10 786 10819 531 20.374765
Oakridge 332 2089 889 1176 2290 172 669 6 414 8037 401 20.042394
Arbutus Ridge 325 1672 934 337 1852 160 498 3 285 6066 370 16.394595
Killarney 302 2130 1761 245 3990 163 1302 12 570 10475 664 15.775602
West Point Grey 331 1299 879 260 1971 372 450 4 305 5871 445 13.193258
Shaughnessy 129 1774 633 25 1769 139 371 7 579 5426 446 12.165919
Kerrisdale 326 1826 1049 265 2805 179 547 7 443 7447 631 11.801902
Stanley Park 72 65 246 13 2868 214 74 6 217 3775 405 9.320988
Dunbar-Southlands 294 1847 1324 241 2899 240 629 3 269 7746 856 9.049065
Musqueam 17 86 104 1 217 7 40 1 59 532 125 4.256000

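The Central Business District and West End have by far the highest crime density (about 300 and 209 crimes per hectare, compared with 98 for Fairview in third place). These are likely high-density areas with a lot of people in a small space, so it would also be worth looking at crime per capita, though collecting population data is a lot more work.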

In [23]:
# Yearly crime counts for three neighbourhoods
downtown = data[data.NEIGHBOURHOOD == 'Central Business District'].groupby('YEAR').size()
strathcona = data[data.NEIGHBOURHOOD == 'Strathcona'].groupby('YEAR').size()
mt_pleasant = data[data.NEIGHBOURHOOD == 'Mount Pleasant'].groupby('YEAR').size()
In [24]:
ax = downtown.plot(label = 'Downtown')
strathcona.plot(label = "Strathcona", ax=ax)
mt_pleasant.plot(label = 'Mt Pleasant', ax=ax)
plt.legend()
plt.title('Number of Crimes from 2003 to 2017')
plt.show()

Why is there such a sharp rise in crime in 2016?

Let's see whether this trend holds for the rest of the neighbourhoods. Maybe there was a rise in population?
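Two quick checks help when reading these trends: whether each year is fully covered (a partial final year looks like a drop) and the year-over-year change. A sketch on made-up rows; on the real data the same calls would run on `data`:

```python
import pandas as pd

# Hypothetical rows: the last year only covers January to June
toy = pd.DataFrame({
    "YEAR": [2015] * 3 + [2016] * 4 + [2017] * 2,
    "MONTH": [1, 6, 12, 2, 5, 9, 12, 3, 6],
})

yearly = toy.groupby("YEAR").size()              # crimes per year
last_month = toy.groupby("YEAR")["MONTH"].max()  # latest month seen in each year
change = yearly.pct_change().mul(100).round(1)   # % change from the previous year
print(pd.DataFrame({"count": yearly, "last_month": last_month, "pct_change": change}))
```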

In [25]:
neighbourhood_time_crime = pd.pivot_table(data, values = 'counter', index = 'YEAR', columns = 'NEIGHBOURHOOD', aggfunc = 'sum')
ax = neighbourhood_time_crime.plot(title = 'Crime Count by Years')
ax.get_legend().set_bbox_to_anchor((1, 1))
In [27]:
area_neighbourhood = area_neighbourhood.drop(area_neighbourhood.tail(1).index)
area_neighbourhood
Out[27]:
AREA
NEIGHBOURHOOD
Arbutus Ridge 370
Central Business District 370
Dunbar-Southlands 856
Fairview 327
Grandview-Woodland 445
Hastings-Sunrise 793
Kensington-Cedar Cottage 724
Kerrisdale 631
Killarney 664
Kitsilano 546
Marpole 559
Mount Pleasant 366
Musqueam 125
Oakridge 401
Renfrew-Collingwood 805
Riley Park 491
Shaughnessy 446
South Cambie 217
Stanley Park 405
Strathcona 388
Sunset 626
Victoria-Fraserview 531
West End 198
West Point Grey 445
In [29]:
time_density_neighbourhood = pd.pivot_table(data, values = 'counter', index = 'NEIGHBOURHOOD', columns = 'YEAR', aggfunc = 'sum').divide(area_neighbourhood.AREA, axis=0).T
ax = time_density_neighbourhood.plot(title="Crime Density through the Years")
ax.get_legend().set_bbox_to_anchor((1, 1))
In [30]:
data[data.NEIGHBOURHOOD == "Central Business District"]
Out[30]:
TYPE YEAR MONTH DAY HOUR MINUTE HUNDRED_BLOCK NEIGHBOURHOOD X Y Latitude Longitude counter
20 Other Theft 2003 4 30 13.0 6.0 9XX SEYMOUR ST Central Business District 491205.19 5458520.26 49.279374 -123.120920 1
21 Other Theft 2003 12 12 15.0 50.0 9XX SEYMOUR ST Central Business District 491143.26 5458445.58 49.278701 -123.121770 1
22 Other Theft 2003 3 7 16.0 15.0 9XX ROBSON ST Central Business District 491132.15 5458889.26 49.282692 -123.121932 1
30 Theft from Vehicle 2003 6 17 16.0 15.0 11XX MAINLAND ST Central Business District 491144.35 5458003.10 49.274721 -123.121745 1
36 Theft from Vehicle 2003 6 13 12.0 0.0 11XX MAINLAND ST Central Business District 491144.35 5458003.10 49.274721 -123.121745 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
530623 Other Theft 2017 5 3 16.0 7.0 11XX HOMER ST Central Business District 491129.41 5458150.88 49.276050 -123.121954 1
530635 Other Theft 2017 3 12 15.0 7.0 3XX ABBOTT ST Central Business District 492219.19 5458881.35 49.282636 -123.106985 1
530637 Theft from Vehicle 2017 6 19 8.0 30.0 HOWE ST / W CORDOVA ST Central Business District 491719.49 5459323.80 49.286609 -123.113865 1
530643 Theft from Vehicle 2017 2 9 22.0 0.0 HOWE ST / W CORDOVA ST Central Business District 491719.49 5459323.80 49.286609 -123.113865 1
530650 Theft from Vehicle 2017 6 5 17.0 0.0 8XX HAMILTON ST Central Business District 491487.85 5458385.78 49.278168 -123.117031 1

110947 rows × 13 columns

There is a news article that blames the increase on migration from Alberta. Is that fair, or low-key prejudice?

https://www.vancourier.com/news/vpd-sees-10-year-spike-in-break-ins-to-cars-businesses-1.2370931


What is the trend in each type of crime per neighbourhood?

Let's break crimes into serious and non-serious crimes.
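Rather than filtering down to one group, a label column lets both groups be compared side by side in a single pivot or groupby. A sketch with a hypothetical `SEVERITY` column, using the same list of serious crimes as the code below:

```python
import numpy as np
import pandas as pd

# Same list as serious_crimes in In [33]
serious_crimes = ['Other Theft', 'Break and Enter Residential/Other', 'Mischief',
                  'Break and Enter Commercial']

toy = pd.DataFrame({"TYPE": ["Mischief", "Theft of Bicycle", "Other Theft", "Theft of Vehicle"]})
# Tag each row so serious and non-serious crimes can be grouped together later
toy["SEVERITY"] = np.where(toy["TYPE"].isin(serious_crimes), "serious", "non-serious")
print(toy["SEVERITY"].value_counts())
```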

In [32]:
data
Out[32]:
TYPE YEAR MONTH DAY HOUR MINUTE HUNDRED_BLOCK NEIGHBOURHOOD X Y Latitude Longitude counter
0 Other Theft 2003 5 12 16.0 15.0 9XX TERMINAL AVE Strathcona 493906.50 5457452.47 49.269802 -123.083763 1
1 Other Theft 2003 5 7 15.0 20.0 9XX TERMINAL AVE Strathcona 493906.50 5457452.47 49.269802 -123.083763 1
2 Other Theft 2003 4 23 16.0 40.0 9XX TERMINAL AVE Strathcona 493906.50 5457452.47 49.269802 -123.083763 1
3 Other Theft 2003 4 20 11.0 15.0 9XX TERMINAL AVE Strathcona 493906.50 5457452.47 49.269802 -123.083763 1
4 Other Theft 2003 4 12 17.0 45.0 9XX TERMINAL AVE Strathcona 493906.50 5457452.47 49.269802 -123.083763 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
530647 Break and Enter Residential/Other 2017 3 3 9.0 16.0 31XX ADANAC ST Hastings-Sunrise 497265.49 5458296.71 49.277420 -123.037595 1
530648 Mischief 2017 5 29 22.0 30.0 14XX E 7TH AVE Grandview-Woodland 494533.97 5456824.97 49.264163 -123.075129 1
530649 Offence Against a Person 2017 4 13 NaN NaN OFFSET TO PROTECT PRIVACY NaN 0.00 0.00 0.000000 0.000000 1
530650 Theft from Vehicle 2017 6 5 17.0 0.0 8XX HAMILTON ST Central Business District 491487.85 5458385.78 49.278168 -123.117031 1
530651 Vehicle Collision or Pedestrian Struck (with I... 2017 6 6 17.0 38.0 13XX BLOCK PARK DR Marpole 490204.00 5451444.00 49.215706 -123.134512 1

530652 rows × 13 columns

Serious Crimes

'Other Theft', 'Break and Enter Residential/Other', 'Mischief', 'Break and Enter Commercial'

In [33]:
serious_crimes = ['Other Theft', 'Break and Enter Residential/Other', 'Mischief',
       'Break and Enter Commercial']

serious_crime_data = data[data.TYPE.isin(serious_crimes)]

time_density_neighbourhood = pd.pivot_table(serious_crime_data, values = 'counter', index = 'NEIGHBOURHOOD', columns = 'YEAR', aggfunc = 'sum').divide(area_neighbourhood.AREA, axis=0).T
ax = time_density_neighbourhood.plot(title="Serious Crime Density through the Years")
ax.get_legend().set_bbox_to_anchor((1, 1))
In [34]:
def plot_density(data, crime_type):
    subset = data[data.TYPE == crime_type]
    time_density_neighbourhood = pd.pivot_table(subset, values = 'counter', index = 'NEIGHBOURHOOD', columns = 'YEAR', aggfunc = 'sum').divide(area_neighbourhood.AREA, axis=0).T
    ax = time_density_neighbourhood.plot(title="Crime Density through the Years: " + str(crime_type))
    ax.get_legend().set_bbox_to_anchor((1, 1))
    
In [35]:
plot_density(data, 'Other Theft')
In [36]:
# Homicide and Offence Against a Person have no recorded location, so they're excluded below
crimes = pd.unique(data.TYPE)
crimes
Out[36]:
array(['Other Theft', 'Break and Enter Residential/Other', 'Mischief',
       'Break and Enter Commercial', 'Offence Against a Person',
       'Theft from Vehicle',
       'Vehicle Collision or Pedestrian Struck (with Injury)',
       'Vehicle Collision or Pedestrian Struck (with Fatality)',
       'Theft of Vehicle', 'Homicide', 'Theft of Bicycle'], dtype=object)
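The two `!=` filters in In [37] can be collapsed into one `np.isin` mask, which also keeps the excluded types in one list. A sketch on a shortened array:

```python
import numpy as np

crimes = np.array(['Other Theft', 'Mischief', 'Offence Against a Person',
                   'Homicide', 'Theft of Bicycle'])  # shortened list for the example
no_location = ['Homicide', 'Offence Against a Person']

# One boolean mask instead of two successive != filters
known_crimes = crimes[~np.isin(crimes, no_location)]
print(known_crimes)
```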

Crime Density by Year by Crime

In [37]:
known_crimes = crimes[crimes != 'Homicide']
known_crimes = known_crimes[known_crimes != 'Offence Against a Person']
known_crimes # there are 9 crimes

for crime in known_crimes:
    plot_density(data, crime)
    

Break and Enter Residential and Theft of Vehicle have both gone down consistently.

Theft of Bicycle, Other Theft, and Break and Enter Commercial are rising.

Overall, crime has risen recently, peaking in 2016. This is quite strange.


Let's do it by month

The plots above were by year; now I'll do it by month, then by day of the month and by hour, to look at the trends in each crime type.

In [38]:
def plot_density_month(data, crime_type):
    subset = data[data.TYPE == crime_type]
    time_density_neighbourhood = pd.pivot_table(subset, values = 'counter', index = 'NEIGHBOURHOOD', columns = 'MONTH', aggfunc = 'sum').divide(area_neighbourhood.AREA, axis=0).T
    ax = time_density_neighbourhood.plot(title="Crime Density by Month: " + str(crime_type))
    ax.get_legend().set_bbox_to_anchor((1, 1))
In [39]:
for crime in known_crimes:
    plot_density_month(data, crime)

Most crime types are consistent throughout the year, except bike theft. Bike theft rises in the summer months, which makes sense: people bike more, and want bikes more, in the summer.
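To put a number on the summer rise, one option is the share of a crime type that falls in June to August; with no seasonality it would be about 3/12 = 25%. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows for bicycle thefts and one other crime type
toy = pd.DataFrame({"TYPE": ["Theft of Bicycle"] * 6 + ["Mischief"] * 2,
                    "MONTH": [6, 7, 7, 8, 1, 12, 1, 7]})

bikes = toy[toy["TYPE"] == "Theft of Bicycle"]
# Count per month, including months with no thefts
monthly = bikes.groupby("MONTH").size().reindex(range(1, 13), fill_value=0)
# Share of the year's bike thefts that fall in June, July and August
summer_share = monthly.loc[6:8].sum() / monthly.sum()
print(monthly.tolist(), round(summer_share, 3))
```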

In [40]:
def plot_density(data, crime_type, time_interval='YEAR'):
    # time_interval is the column to plot against: 'YEAR', 'MONTH', 'DAY' or 'HOUR'
    subset = data[data.TYPE == crime_type]
    time_density_neighbourhood = pd.pivot_table(subset, values = 'counter', index = 'NEIGHBOURHOOD', columns = time_interval, aggfunc = 'sum').divide(area_neighbourhood.AREA, axis=0).T
    ax = time_density_neighbourhood.plot(title="Crime Density by " + time_interval + ": " + str(crime_type))
    ax.get_legend().set_bbox_to_anchor((1, 1))

This is by day of the month.

In [41]:
for crime in known_crimes:
    plot_density(data, crime, 'DAY')

Fairly constant. There's a spike in Break and Enter Commercial and Mischief on the 15th, which is strange. It's the 2011 Stanley Cup riot, which happened on June 15th.
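To check that a spike on the 15th comes from one date rather than the 15th of every month, count crimes by full calendar date. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows: three crimes on 2011-06-15 and one on another month's 15th
toy = pd.DataFrame({
    "TYPE": ["Mischief", "Mischief", "Break and Enter Commercial", "Mischief"],
    "YEAR": [2011, 2011, 2011, 2012],
    "MONTH": [6, 6, 6, 3],
    "DAY": [15, 15, 15, 15],
})

# to_datetime can assemble dates from year/month/day columns
dates = pd.to_datetime(toy[["YEAR", "MONTH", "DAY"]].rename(columns=str.lower))
busiest_day = dates.value_counts().idxmax()
print(busiest_day.date())
```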

Last, by hour

In [42]:
for crime in known_crimes:
    plot_density(data, crime, 'HOUR')

People steal things during the day, between 10:00 and 20:00. Mischief continues through the night, and vehicle theft occurs mostly at night.

Something important to note is that these are the times when reports occur, not necessarily when the actual crime occurred.